Sample-efficient Policy Optimization with Stein Control Variate
Authors
Abstract
Policy gradient methods have achieved remarkable successes in solving challenging reinforcement learning problems. However, they still often suffer from high variance in policy gradient estimation, which leads to poor sample efficiency during training. In this work, we propose a control variate method to effectively reduce variance for policy gradient methods. Motivated by Stein’s identity, our method extends the previous control variate methods used in REINFORCE and advantage actor-critic by introducing more general action-dependent baseline functions. Empirical studies show that our method substantially improves the sample efficiency of state-of-the-art policy gradient approaches.
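To make the idea of an action-dependent baseline concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-step problem with a one-dimensional Gaussian policy N(mu, sigma^2) and compares three estimators of the gradient with respect to mu: the plain score-function (REINFORCE) estimator, the same estimator with a constant baseline, and one with an action-dependent baseline whose expectation is restored through Stein's identity for Gaussians, E[(a - mu) / sigma^2 * phi(a)] = E[phi'(a)]. The functions `reward`, `phi`, `dphi_da`, and `gradient_samples` are hypothetical names chosen for illustration only.

```python
import random
import statistics


def reward(a):
    # Hypothetical one-step reward, standing in for Q(s, a).
    return -(a - 3.0) ** 2


def phi(a):
    # Hypothetical action-dependent baseline: a deliberately imperfect fit to reward().
    return -(a - 2.5) ** 2


def dphi_da(a):
    # Derivative of phi with respect to the action.
    return -2.0 * (a - 2.5)


def gradient_samples(mu=0.0, sigma=1.0, n=100_000, seed=0):
    """Per-sample estimates of d/dmu E[reward(a)] for a ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    # E[reward(a)] in closed form, used as the constant baseline.
    const_baseline = -((mu - 3.0) ** 2 + sigma ** 2)
    plain, constant, stein = [], [], []
    for _ in range(n):
        a = rng.gauss(mu, sigma)
        score = (a - mu) / sigma ** 2  # d/dmu log N(a; mu, sigma^2)
        plain.append(score * reward(a))
        constant.append(score * (reward(a) - const_baseline))
        # Subtract phi inside the score term, then add back E[phi'(a)],
        # which equals E[score * phi(a)] by Stein's identity.
        stein.append(score * (reward(a) - phi(a)) + dphi_da(a))
    return {"plain": plain, "constant": constant, "stein": stein}


samples = gradient_samples()
summary = {
    name: (statistics.fmean(values), statistics.pvariance(values))
    for name, values in samples.items()
}
for name, (mean, var) in summary.items():
    print(f"{name:9s} mean={mean:7.3f} variance={var:9.3f}")
```

In this toy setup the exact gradient is -2(mu - 3) = 6 at mu = 0, so all three sample means should land near 6, while the variance should shrink from the plain estimator to the constant baseline to the action-dependent one. How much is gained depends on how closely `phi` tracks the reward, which is an assumption of this sketch and not a result from the paper.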
Similar resources
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is the high sample complexity of such methods. Unbiased batch policy-gradient methods offer stable learning, but at the cost of high variance, which often requires large batches, while TD-style methods, such as off-policy act...
Action-dependent Control Variates for Policy Optimization via Stein's Identity
Policy gradient methods have achieved remarkable successes in solving challenging reinforcement learning problems. However, they still often suffer from high variance in policy gradient estimation, which leads to poor sample efficiency during training. In this work, we propose a control variate method to effectively reduce variance for policy gradient methods. Motivated by Stein’s...
Predictive Matrix-Variate t Models
It is becoming increasingly important to learn from a partially-observed random matrix and predict its missing elements. We assume that the entire matrix is a single sample drawn from a matrix-variate t distribution and suggest a matrix-variate t model (MVTM) to predict those missing elements. We show that MVTM generalizes a range of known probabilistic models, and automatically performs model se...
Variance analysis of control variate technique and applications in Asian option pricing
This paper presents an analytical view of variance reduction by the control variate technique for pricing arithmetic Asian options as a financial derivative. In this paper, the effect of correlation between two random variables is shown. We propose an efficient method for choosing a suitable control in pricing arithmetic Asian options based on control variates (CV). The numerical experiment shows ...
Stein Variational Policy Gradient
Policy gradient methods have been successfully applied to many complex reinforcement learning problems. However, policy gradient methods suffer from high variance, slow convergence, and inefficient exploration. In this work, we introduce a maximum entropy policy optimization framework which explicitly encourages parameter exploration, and show that this framework can be reduced to a Bayesian in...
Journal title:
- CoRR
Volume: abs/1710.11198
Publication date: 2017